VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio

Basha, Maris, Zai, Anja, Stoll, Sabine, Hahnloser, Richard

arXiv.org Artificial Intelligence

General-purpose audio representations aim to map acoustically variable instances of the same event to nearby points, resolving content identity in a zero-shot setting. Unlike supervised classification benchmarks that measure adaptability via parameter updates, we introduce VocSim, a training-free benchmark probing the intrinsic geometric alignment of frozen embeddings. VocSim aggregates 125k single-source clips from 19 corpora spanning human speech, animal vocalizations, and environmental sounds. By restricting to single-source audio, we isolate content representation from the confound of source separation. We evaluate embeddings using Precision@k for local purity and the Global Separation Rate (GSR) for point-wise class separation. To calibrate GSR, we report lift over an empirical permutation baseline. Across diverse foundation models, a simple pipeline (frozen Whisper encoder features, time-frequency pooling, and label-free PCA) yields strong zero-shot performance. However, VocSim also uncovers a consistent generalization gap: on blind, low-resource speech, local retrieval drops sharply. While performance remains statistically distinguishable from chance, the absolute geometric structure collapses, indicating a failure to generalize to unseen phonotactics. As external validation, our top embeddings predict avian perceptual similarity, improve bioacoustic classification, and achieve state-of-the-art results on the HEAR benchmark. We posit that the intrinsic geometric quality measured here proxies utility in unlisted downstream applications. We release data, code, and a public leaderboard to standardize the evaluation of intrinsic audio geometry.
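To make the local-purity metric concrete, here is a minimal sketch of Precision@k over a set of frozen embeddings: for each clip, it takes the k nearest other clips and counts the fraction sharing the clip's label. The abstract does not state the distance measure, so cosine distance is an assumption here, and the function names are illustrative rather than taken from the VocSim code release.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def precision_at_k(embeddings, labels, k):
    """Mean fraction of each point's k nearest neighbours sharing its label.

    No training is involved: the embeddings are used exactly as given.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        # Rank every other point by distance to point i.
        ranked = sorted(
            (cosine_distance(embeddings[i], embeddings[j]), j)
            for j in range(n)
            if j != i
        )
        neighbours = [j for _, j in ranked[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / n


# Toy data: two well-separated classes of two clips each.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["a", "a", "b", "b"]
p_at_1 = precision_at_k(toy_embeddings, toy_labels, k=1)
p_at_3 = precision_at_k(toy_embeddings, toy_labels, k=3)
```

On this toy set each clip's single nearest neighbour belongs to the same class, while with k=3 only one of the three neighbours can match, which shows how the score depends on k relative to class size.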


Detecting and explaining postpartum depression in real-time with generative artificial intelligence

García-Méndez, Silvia, de Arriba-Pérez, Francisco

arXiv.org Artificial Intelligence

Among the many challenges mothers undergo after childbirth, postpartum depression (PPD) is a severe condition that significantly impacts their mental and physical well-being. Consequently, the rapid detection of PPD and its associated risk factors is critical for timely assessment and intervention through specialized prevention procedures. Accordingly, this work addresses the need to help practitioners make decisions with the latest technological advancements to enable real-time screening and treatment recommendations. Specifically, our work contributes an intelligent PPD screening system that combines Natural Language Processing, Machine Learning (ML), and Large Language Models (LLMs) towards an affordable, real-time, and non-invasive free speech analysis. Moreover, it addresses the black box problem, since the predictions are explained to the end users by combining LLMs with interpretable ML models (i.e., tree-based algorithms) using feature importance and natural language. The system achieves 90% on PPD detection for all evaluation metrics, outperforming the competing solutions in the literature. Ultimately, our solution contributes to the rapid detection of PPD and its associated risk factors, which is critical for timely and proper assessment and intervention.

Introduction

Depression is a global public health concern that affects more than 150 million people and is more prevalent in women (Labaka et al., 2018; Moreira et al., 2019). Among the many challenges mothers undergo after childbirth, postpartum depression (PPD) is a severe condition that usually requires medical intervention (Falana & Carrington, 2019). PPD is a common non-psychotic mental disorder occurring during the first year after childbirth that can lead to severe complications in women's health (Abadiga, 2019). Current data indicates that between 10% and 15% of mothers worldwide are affected by PPD yearly (Fatima et al., 2019; Liu et al., 2023). Moreover, only 20% of the target population is diagnosed or even treated promptly (Mazumder & Baruah, 2021).


Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks

Liu, Jianhong, Li, Dianshi

arXiv.org Artificial Intelligence

Initially, those scholars employ these historical elements to forecast whether the criminal would re-offend. Subsequently, the binary outcome of recidivism serves as a proxy variable for recidivism risk. Some computer scientists also employ the probability (or score) assigned by the model for an offender's likelihood of re-offense as a gauge for their recidivism risk (Etzler et al., 2023; Ma et al., 2022; Wang et al., 2022). While such configurations may seem intuitively compelling, they often embody an oversimplified and deterministic viewpoint, which stands in contradiction to contemporary social science theories. Firstly, historical factors alone are insufficient predictors of human actions.


Robot navigation and target capturing using nature-inspired approaches in a dynamic environment

Verma, Devansh, Saxena, Priyansh, Tiwari, Ritu

arXiv.org Artificial Intelligence

Path planning and target searching in a three-dimensional environment is a challenging task in the field of robotics. It is an optimization problem, as the path from source to destination has to be optimal. This paper aims to generate a collision-free trajectory in a dynamic environment. Path planning is of extreme importance in military operations, search and rescue missions, and other life-saving tasks. During its operation, the unmanned air vehicle operates in a hostile environment, and faster replanning is needed to reach the target as optimally as possible. This paper presents a novel approach of hierarchical planning using multiresolution abstract levels for faster replanning. Economic constraints such as path length, total path planning time, and the number of turns are taken into consideration, which mandates the use of cost functions. Experimental results show that the hierarchical version of GSO gives better performance compared to BBO, IWO, and their hierarchical versions.
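The abstract names three cost terms (path length, total planning time, and number of turns) without giving the cost function itself. The sketch below shows one plausible way to combine them, assuming a simple weighted sum over a path given as 3D waypoints. The weights, function names, and the weighted-sum form are all illustrative assumptions, not the authors' formulation.

```python
import math


def path_length(path):
    """Total Euclidean length of a polyline given as a list of 3D waypoints."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def _unit(v):
    """Normalize a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def count_turns(path, tol=1e-9):
    """Count waypoints where the heading changes.

    Assumes consecutive waypoints are distinct (no zero-length segments).
    """
    turns = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        u = _unit(tuple(bi - ai for ai, bi in zip(a, b)))
        v = _unit(tuple(ci - bi for bi, ci in zip(b, c)))
        # Headings differ when the unit vectors are not (numerically) equal.
        if sum(x * y for x, y in zip(u, v)) < 1.0 - tol:
            turns += 1
    return turns


def path_cost(path, planning_time, w_len=1.0, w_time=1.0, w_turn=1.0):
    """Hypothetical weighted-sum cost; lower is better."""
    return (
        w_len * path_length(path)
        + w_time * planning_time
        + w_turn * count_turns(path)
    )


# Toy path: straight along x for two units, then one turn onto y.
toy_path = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
toy_cost = path_cost(toy_path, planning_time=0.5)
```

An optimizer such as those compared in the paper would use a function of this kind as its fitness measure when ranking candidate paths; the relative weights would need tuning for the application.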